Abstract

The notion of mutually unbiased bases (MUB) was first introduced by Ivanovic to reconstruct density matrices [10]. This paper studies how to use MUB to analyze, process, and utilize the information carried by the second moments between random variables. In the first part, the mathematical foundation is built. It is shown that the spectra of MUB contain complete information about the correlation matrices of finite discrete signals, and several useful properties of these spectra are established. Roughly speaking, each spectrum from MUB plays an equal role for finite discrete signals, and the effect of one spectrum on any other can be treated as a global constant shift. These properties are used to find some important and natural characterizations of random vectors and random discrete operators/filters. On the technical side, it is shown that any MUB spectrum can be computed as fast as the Fourier spectrum when the length of the signal is a prime number.

In the second part, some applications are presented. First, a protocol for increasing the number of users in a basic digital communication model is studied, which brings some deep insights into how to encode information into the second moments between random variables. Second, the application to signal analysis is studied. It is suggested that complete "MUB" spectra analysis works well in any case, and that one can simply choose the spectra of interest for the analysis. For instance, single Fourier spectrum analysis can also be applied in the nonstationary case. Finally, the application of MUB to dimensionality reduction is considered for the case where the prior knowledge of the data is not reliable.



INDEX TERMS: Mutually Unbiased Bases, Second Moment, Correlation Matrix, Digital Communication, Signal Processing, Dimensionality Reduction



I. INTRODUCTION 



Ivanovic first introduced mutually unbiased bases (MUB) to reconstruct density matrices [10]:

Definition 1 Let $M_v = \{v_1, v_2, \ldots, v_d\}$ and $M_u = \{u_1, u_2, \ldots, u_d\}$ be two orthonormal bases of the $d$-dimensional complex space. They are said to be mutually unbiased if and only if $|\langle v_i, u_j \rangle| = \frac{1}{\sqrt{d}}$ for any $i, j = 1, 2, \ldots, d$. A set of orthonormal bases $\{M_1, M_2, \ldots, M_n\}$ is said to be mutually unbiased if and only if each pair of distinct bases $M_i$ and $M_j$ is mutually unbiased.
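As a concrete illustration of Definition 1, the following minimal sketch builds $d+1$ bases for an odd prime $d$ and checks the unbiasedness condition numerically. It assumes the standard quadratic-phase construction (standard basis, Fourier basis, and "chirped" Fourier bases); the helper names `build_mub` and `is_mutually_unbiased` are hypothetical and are not taken from the paper.

```python
import numpy as np

def build_mub(d):
    """Return d+1 unitary matrices whose columns are mutually unbiased bases.

    Assumption: d is an odd prime. bases[0] is the standard basis and
    bases[1] is the normalized Fourier basis (quadratic coefficient a = 0).
    """
    row = np.arange(d).reshape(-1, 1)
    col = np.arange(d).reshape(1, -1)
    bases = [np.eye(d, dtype=complex)]
    for a in range(d):
        # Entry (row, col) is W^(a*row^2 + col*row) / sqrt(d), with W = exp(2*pi*i/d).
        exponent = (a * row * row + col * row) % d
        bases.append(np.exp(2j * np.pi * exponent / d) / np.sqrt(d))
    return bases

def is_mutually_unbiased(bases, tol=1e-9):
    """Check |<v_i, u_j>| = 1/sqrt(d) for every pair of distinct bases."""
    d = bases[0].shape[0]
    for p in range(len(bases)):
        for q in range(p + 1, len(bases)):
            overlaps = np.abs(bases[p].conj().T @ bases[q])
            if not np.allclose(overlaps, 1 / np.sqrt(d), atol=tol):
                return False
    return True

if __name__ == "__main__":
    print(is_mutually_unbiased(build_mub(5)))
```

The check only confirms the definition for the chosen construction; it says nothing about dimensions that are not odd primes.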

MUB are widely used in quantum physics and quantum information theory, for example in the reconstruction of pre-states [12], tomography, the Wigner distribution [7], teleportation [6], and quantum cryptography [2][3][4]. But they have only a few classical applications, such as [21]. This is quite reasonable, because a full MUB spectra analysis needs $d+1$ times the time and space resources of a single transform, where $d$ is the length of the signals. It should be noticed, however, that the bases of a MUB have natural connections with the Fourier basis, which has plenty of applications; [17] has done some study of this connection. Intuitively, the relation between any two bases of a MUB is the same as that between the standard basis and the Fourier basis, if we only consider the inner products of the vectors.

One of the major subjects in this area is the construction of MUB for a given dimension $d$. It is known that there are no more than $d+1$ MUB for dimension $d$, and that when $d$ is a power of a prime, all $d+1$ MUB can be explicitly constructed [12]. This paper focuses only on the case where $d+1$ MUB can be found for dimension $d$, and does not study their construction. Sections II to IV introduce some mathematical foundations. Sections V to VII then present some interesting applications of these results.

In Section II, the equivalence between the autocorrelation matrix and the spectra of mutually unbiased bases is formally presented. Some interesting properties concerning which spectra can form an autocorrelation matrix are studied, such as a generalization of the Uncertainty Principle. It is shown that the equivalence is robust, in the sense that small errors in one representation cause only small errors in the other.

In Section III, some nice properties of the spectra of MUB are studied. First, the original definition of "stationary" is greatly extended, and it is interesting to see that any discrete random signal has all kinds of "stationary" versions. Then the relationship between correlated random sources and independent random sources is presented; it is shown that treating general random sources as a collection of independent random sources brings a lot of convenience. Of course, MUB is the key tool. The third part of this section uses the properties of MUB to give a complete analysis of random operators/filters. This part introduces a general way to perform all kinds of stabilization of random vectors, at the cost of some compensation in "white noise". Finally, a filter that only deals with some designated spectra and leaves the others untouched is presented.

In Section IV, the MUB spectra of a deterministic vector are studied. In the first part, an algorithm is given which shows that any MUB transform can be computed as fast as the DFT in prime dimensions. Then some properties of the MUB spectra of deterministic vectors are listed.

The main application of the above results is a simple digital communication protocol which can significantly increase the number of users without any advanced techniques such as [13]. This is introduced in Section V. The theoretical protocol may be far from practice, but it provides some deep insights into how to encode information into the second moments between random variables, based on the above results. Roughly speaking, communication using the first moments of the signals is well studied [15], while our protocol is based on moments of higher order. When some users are idle, the protocol degenerates to simple ones such as "TDMA"/"FDMA". Based on the results of Section III, we introduce some interesting variations of this model, which suggest that many things can be done on top of it.

In Section VI, we study the application to signal analysis. Spectral analysis of stationary signals is useful and well known [151 IT], while the nonstationary case is much harder [15][18]. Using MUB, we suggest that complete spectra analysis of discrete signals works well in any case. In fact, it suggests that one can choose the spectra of interest for the analysis. For instance, Fourier spectrum analysis also makes sense in the nonstationary case. We give an example of how to apply it to signal detection. However, more work is needed on the physical meaning of the non-Fourier spectra of MUB, because they are important for both practical and mathematical reasons.

Finally, we consider the applications of MUB in dimensionality reduction. For the case where no prior knowledge of the data is available, we present some local results and a global conjecture. When the prior knowledge is not reliable, we suggest that MUB work well.

We now fix some basic notation for the paper. We only work in a $d$-dimensional complex linear space for which all $d+1$ mutually unbiased bases (MUB) can be found. Assume $M_1, M_2, \ldots, M_{d+1}$ are the MUB of the $d$-dimensional complex linear space, where the columns of $M_i$ form the $i$-th basis. Without loss of generality, $M_1$ is the standard basis. All random variables mentioned in this paper are assumed to have zero expectation, because a constant shift is easy to handle. So in this paper "autocorrelation matrix" and "correlation matrix" have the same meaning. Each vector is a column vector by default. $R_X$ denotes the autocorrelation matrix of the complex random vector $X = (x_1, x_2, \ldots, x_d)^T$, and $tr(R_X) = 1$ by default. We say $x$ is "white noise" if and only if $E(x) = 0$ and $x$ is independent of all other random variables mentioned in this paper.

II. THE EQUIVALENCE BETWEEN CORRELATION MATRICES AND THE SPECTRA OF MUB



Ivanovic first introduced the idea of using the spectra of mutually unbiased bases to reconstruct the density matrices of quantum states [10]. It is easy to see that when a unitary matrix $U$ is applied to a random vector, the correlation matrix changes in the same way as a density matrix does when $U$ is applied to the quantum state. So, following the notation of the introduction, we give some basic definitions.

Definition 2 The k-spectrum $S_k$ of $R_X$ is the diagonal part of the matrix $M_k^{-1} \cdot R_X \cdot M_k$. The set $\{S_1, S_2, \ldots, S_{d+1}\}$ forms the complete spectra of $R_X$.

Then we present the following theorem, which is the basis of this paper. Let $I_d$ denote the identity matrix of dimension $d$, and let $Diag(V)$ be the diagonal matrix whose diagonal part equals $V$.

Theorem 1 Each autocorrelation matrix $R_X$ corresponds to a unique set of $d+1$ nonnegative real vectors $\{S_1, S_2, \ldots, S_{d+1}\}$, where $S_k$ is the k-spectrum of $R_X$ and, for each $k$, $\sum_{i=1}^{d} (S_k)_i = 1$. $\{S_1, S_2, \ldots, S_{d+1}\}$ can reconstruct $R_X$ by

$$R_X = \sum_{i=1}^{d+1} M_i \cdot Diag(S_i) \cdot M_i^{-1} - I_d \tag{1}$$

But the converse does not hold, i.e. there are sets of $d+1$ nonnegative real vectors $\{V_1, V_2, \ldots, V_{d+1}\}$ satisfying $\sum_{i=1}^{d} (V_k)_i = 1$ for each $k$ that cannot form the complete spectra of any autocorrelation matrix.

Proof. The first part of the theorem is proved in [10], where we only need to replace "density matrices" by "autocorrelation matrices". It is easy to find a counterexample for the second part. Let $V_i$ be the zero vector except for the $i$-th term, which is 1, for $i = 1, 2, \ldots, d$. Then no matter how we choose $V_{d+1}$, $\{V_1, V_2, \ldots, V_{d+1}\}$ cannot form the spectra of any autocorrelation matrix. □
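The reconstruction of an autocorrelation matrix from its complete spectra can be checked numerically. The sketch below assumes an odd prime $d$ and a standard quadratic-phase MUB construction; `build_mub`, `complete_spectra` and `reconstruct` are hypothetical helper names. It uses the fact that the $M_i$ are unitary, so $M_i^{-1} = M_i^H$, and subtracts $tr(R_X) \cdot I_d$, which equals $I_d$ under the default normalization $tr(R_X) = 1$.

```python
import numpy as np

def build_mub(d):
    """d+1 mutually unbiased bases for an odd prime d (assumed construction)."""
    row = np.arange(d).reshape(-1, 1)
    col = np.arange(d).reshape(1, -1)
    bases = [np.eye(d, dtype=complex)]
    for a in range(d):
        exponent = (a * row * row + col * row) % d
        bases.append(np.exp(2j * np.pi * exponent / d) / np.sqrt(d))
    return bases

def complete_spectra(R, bases):
    """k-spectrum = diagonal of M_k^H R M_k, for every basis M_k."""
    return [np.real(np.diag(M.conj().T @ R @ M)) for M in bases]

def reconstruct(spectra, bases):
    """Rebuild R from its complete spectra: sum_i M_i Diag(S_i) M_i^H - tr(R) I."""
    d = bases[0].shape[0]
    trace = float(np.sum(spectra[0]))  # every spectrum sums to tr(R)
    total = sum(M @ np.diag(S) @ M.conj().T for M, S in zip(bases, spectra))
    return total - trace * np.eye(d)

if __name__ == "__main__":
    d = 5
    rng = np.random.default_rng(0)
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    R = A @ A.conj().T
    R = R / np.trace(R).real          # normalize so that tr(R) = 1
    bases = build_mub(d)
    spectra = complete_spectra(R, bases)
    print(np.allclose(reconstruct(spectra, bases), R))
```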

A trivial observation is that many different real nonnegative vectors $\{S_1, S_2, \ldots, S_{d+1}\}$ can construct the same $R_X$ using (1). The next theorem says that this freedom is not interesting, because it only amounts to constant global shifts of the spectra. So by default, in what follows, we use Definition 2 to define the spectra of MUB. Let $One$ denote the length-$d$ vector with every term equal to 1.

Theorem 2 Nonnegative real vectors $\{S_1, S_2, \ldots, S_{d+1}\}$ and $\{S'_1, S'_2, \ldots, S'_{d+1}\}$ can construct the same $R_X$ using (1) only if for each $i = 1, 2, \ldots, d+1$ there exists a real number $u_i$ such that $S_i = S'_i + u_i \cdot One$.

Proof. Assume $S_k$ is the k-spectrum of $R_X$ by Definition 2, and:

$$R_X = \sum_{i=1}^{d+1} M_i \cdot Diag(S'_i) \cdot M_i^{-1} - I_d \tag{2}$$

For each $i \ne j$, we can check that the diagonal part of $M_j^{-1} \cdot M_i \cdot Diag(S'_i) \cdot M_i^{-1} \cdot M_j$ is $u_{j,i} \cdot I_d$, where $u_{j,i}$ is a real number. This finishes the proof. □



In Theorem 1, we have shown that not every set of nonnegative vectors can form an autocorrelation matrix. So which vectors can form the complete spectra is an interesting question. Two theorems on this subject are presented here and used in later sections.

Theorem 3 Let $tr(R_X) = 1$, and let $\{S_1, S_2, \ldots, S_{d+1}\}$ form the complete spectra of an autocorrelation matrix $R_X$. Then $\{S_1, S_2, \ldots, S_d, F\}$ also form the complete spectra of another autocorrelation matrix $R_{X'}$, where $F$ equals $\frac{1}{d} \cdot One$.

Proof. In [10], the author shows that if $\{S_1, S_2, \ldots, S_{d+1}\}$ form the complete spectra of an autocorrelation matrix $R_X$, then $R_X = \sum_{i=1}^{d+1} M_i \cdot Diag(S_i) \cdot M_i^{-1} - I$. He also shows that $\sum_{i=1}^{d} M_i \cdot Diag(S_i) \cdot M_i^{-1} - \frac{d-1}{d} \cdot I$ is also an autocorrelation matrix $R_{X'}$. This finishes the proof. □



The next theorem is the "uncertainty principle" of the complete spectra.

Theorem 4 Let $tr(R_X) = 1$, and let $m_i$ denote the maximum entry of $S_i$. Then:

$$m_j \le \sqrt{2} \cdot \sqrt{1 - m_i} + \frac{1}{d}, \quad i \ne j \tag{3}$$

Proof. Without loss of generality, we assume $j = 2$ and $i = 1$. Let $Dm(R_X)$ denote the matrix whose diagonal part equals the diagonal part of $R_X$ and whose other terms equal 0. Let $Dv(R_X)$ denote the vector which equals the diagonal part of $R_X$.

$$S_2 = Dv(M_2^{H} \cdot R_X \cdot M_2) \tag{4}$$
$$= Dv(M_2^{H} \cdot Dm(R_X) \cdot M_2) + Dv(M_2^{H} \cdot (R_X - Dm(R_X)) \cdot M_2) \tag{5}$$
$$= \frac{1}{d} \cdot One + Dv(M_2^{H} \cdot (R_X - Dm(R_X)) \cdot M_2) \tag{6}$$

$One$ denotes the length-$d$ vector with each term equal to 1. Assume $Dv(R_X) = [d_1, d_2, \ldots, d_d]^T$ and $d_1 = m_1$. Because of the Cauchy-Schwarz inequality, we have $|(R_X)_{i,j}| \le \sqrt{d_i \cdot d_j}$. For a matrix $M$, let $(abs(M))_{i,j} = |(M)_{i,j}|$. Then:

$$\max(Dv(M_2^{H} \cdot (R_X - Dm(R_X)) \cdot M_2)) \tag{7}$$
$$\le \max(Dv(abs(M_2^{H}) \cdot abs(R_X - Dm(R_X)) \cdot abs(M_2))) \tag{8}$$
$$\le \frac{1}{d} \sum_{i \ne j} \sqrt{d_i \cdot d_j} \tag{9}$$
$$\le \frac{2}{d} \Big( \sqrt{(d-1) \cdot d_1 \cdot (1 - d_1)} + \sqrt{(d-2) \cdot d_2 \cdot (1 - d_1 - d_2)} + \tag{10}$$
$$\ldots + \sqrt{d_{d-1} \cdot d_d} \Big) \tag{11}$$
$$\le \frac{2}{d} \cdot \sqrt{1 - d_1} \cdot \Big( \sqrt{(d-1) \cdot d_1} + \sqrt{(d-2) \cdot d_2} + \ldots + \sqrt{d_{d-1}} \Big) \tag{12}$$
$$\le \frac{2}{d} \cdot \sqrt{1 - d_1} \cdot \sqrt{(d-1) + (d-2) + \ldots + 1} \tag{13}$$
$$\le \sqrt{2} \cdot \sqrt{1 - d_1} \tag{14}$$

Steps (10) and (13) follow from the Cauchy-Schwarz inequality, and (12) uses $1 - d_1 - \ldots - d_i \le 1 - d_1$. □
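The inequality of Theorem 4, read as $m_j \le \sqrt{2} \cdot \sqrt{1 - m_i} + \frac{1}{d}$, can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: $d$ is an odd prime, the MUB come from a standard quadratic-phase construction, and `build_mub` and `worst_slack` are hypothetical helper names. A nonnegative slack for every ordered pair $(i, j)$ is consistent with the theorem; it is of course not a proof.

```python
import numpy as np

def build_mub(d):
    """d+1 mutually unbiased bases for an odd prime d (assumed construction)."""
    row = np.arange(d).reshape(-1, 1)
    col = np.arange(d).reshape(1, -1)
    bases = [np.eye(d, dtype=complex)]
    for a in range(d):
        exponent = (a * row * row + col * row) % d
        bases.append(np.exp(2j * np.pi * exponent / d) / np.sqrt(d))
    return bases

def worst_slack(R, bases):
    """Smallest value of (sqrt(2)*sqrt(1 - m_i) + 1/d) - m_j over all i != j."""
    d = R.shape[0]
    peaks = [np.max(np.real(np.diag(M.conj().T @ R @ M))) for M in bases]
    slack = np.inf
    for i, m_i in enumerate(peaks):
        for j, m_j in enumerate(peaks):
            if i != j:
                bound = np.sqrt(2) * np.sqrt(max(1.0 - m_i, 0.0)) + 1.0 / d
                slack = min(slack, bound - m_j)
    return slack

if __name__ == "__main__":
    d = 5
    bases = build_mub(d)
    rng = np.random.default_rng(0)
    A = rng.normal(size=(d, 2)) + 1j * rng.normal(size=(d, 2))
    R = A @ A.conj().T
    R = R / np.trace(R).real
    print(worst_slack(R, bases) >= -1e-9)
```

The extreme case $R_X = e_1 e_1^H$ gives $m_1 = 1$ and flat spectra $\frac{1}{d} \cdot One$ in every other basis, so the bound is attained with equality there.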



The following theorem is about the sensitivity of the equivalence between the two representations of autocorrelation matrices. We consider the cases of random error and deterministic error. The proof is trivial and is omitted here.

Theorem 5 Let $\{S_1, S_2, \ldots, S_{d+1}\}$ be the complete spectra of $R_X$, let $E_R$ be an error matrix, and let $E_{S_i}$ be error vectors. Assume that $R_X + E_R$ is also positive and that $\{S_1 + E_{S_1}, S_2 + E_{S_2}, \ldots, S_{d+1} + E_{S_{d+1}}\}$ is also the complete spectra of an autocorrelation matrix.

(i) If $E_R$ is a deterministic error matrix of $R_X$ satisfying $|E_R|_\infty < \epsilon$, then the complete spectra of $E_R + R_X$ is $\{S_1 + E_{S_1}, S_2 + E_{S_2}, \ldots, S_{d+1} + E_{S_{d+1}}\}$, which satisfies $|E_{S_i}|_\infty < d \cdot \epsilon$ for $i = 1, 2, \ldots, d+1$.

(ii) If $E_R$ is a random error matrix of $R_X$ whose terms are independent, with $E((E_R)_{i,j}) = 0$ and $E(((E_R)_{i,j})^2) < \epsilon$ for all $i, j = 1, 2, \ldots, d$, then the complete spectra of $E_R + R_X$ is $\{S_1 + E_{S_1}, S_2 + E_{S_2}, \ldots, S_{d+1} + E_{S_{d+1}}\}$, which satisfies $E((E_{S_i})_j) = 0$ and $E(((E_{S_i})_j)^2) < \epsilon$ for $i = 1, 2, \ldots, d+1$ and $j = 1, 2, \ldots, d$.

(iii) If for each $i$, $E_{S_i}$ is a deterministic error vector of $S_i$ satisfying $|E_{S_i}|_\infty < \epsilon$, then $\{S_1 + E_{S_1}, S_2 + E_{S_2}, \ldots, S_{d+1} + E_{S_{d+1}}\}$ form the complete spectra of $E_R + R_X$, where $|E_R|_\infty < n \cdot \epsilon$.

(iv) If for each $i$, $E_{S_i}$ is a random error vector of $S_i$ such that the $(E_{S_i})_j$ are independent for $i = 1, 2, \ldots, d+1$ and $j = 1, 2, \ldots, d$, with $E((E_{S_i})_j) = 0$ and $E(((E_{S_i})_j)^2) < \epsilon$, then $\{S_1 + E_{S_1}, S_2 + E_{S_2}, \ldots, S_{d+1} + E_{S_{d+1}}\}$ form the complete spectra of $E_R + R_X$, where $E((E_R)_{i,j}) = 0$ and $E(((E_R)_{i,j})^2) < \epsilon$.

III. THINGS BECOME CLEAR WHEN MUB COMES

A. The Generalization of the Definition of Stationary

Stationary random signals are easy to handle in the sense that we can apply Fourier spectrum analysis. But things become much harder when the signal is nonstationary. In this subsection, the definition of a stationary random vector is greatly extended by means of MUB. This extension matters, because it concerns which domains we should consider in order to do complete signal analysis. In this subsection, $X$ and $X'$ are two random complex vectors, $R_X$ and $R_{X'}$ are the autocorrelation matrices of $X$ and $X'$, and $\{S_1, S_2, \ldots, S_{d+1}\}$ and $\{S'_1, S'_2, \ldots, S'_{d+1}\}$ are the complete spectra of $R_X$ and $R_{X'}$. $F$ again equals $\frac{1}{d} \cdot One$.

Definition 3 $X$ is $[i_1, i_2, \ldots, i_k]$-stationary if and only if $S_{i_1} = S_{i_2} = \ldots = S_{i_k} = tr(R_X) \cdot F$.

Definition 4 $X'$ is an $[i_1, i_2, \ldots, i_k]$-stabilizer of $X$ if and only if $S'_{i_1} = S'_{i_2} = \ldots = S'_{i_k} = tr(R_X) \cdot F$, and $S'_j = S_j$ for each $j \notin \{i_1, i_2, \ldots, i_k\}$.

Proposition I Every $X$ has all kinds of stabilizers, because of Theorem 3.

One should notice that "stabilization" is a process that loses information. An $[i_1, i_2, \ldots, i_k]$-stabilizer of $X$ leaves the information of the $j$-spectrum of $X$ unchanged when $j \notin \{i_1, i_2, \ldots, i_k\}$, so this process protects the information of the remaining spectra. A general way to stabilize signals will be presented later.

There are two interesting propositions which concern some traditionally important properties of random vectors.

Proposition II If $M_2$ is the Fourier basis, $X$ is "stationary" (in the original sense) if and only if $X$ is $[1, 3, 4, \ldots, d+1]$-stationary.

Proposition III $X$ is "white noise" (in the original sense) if and only if $X$ is $[1, 2, \ldots, d+1]$-stationary.

B. Correlation and Independence

In this part, some relationships between general correlated random sources and independent random sources are presented. Let $m(R_X)$ denote the minimum eigenvalue of $R_X$, and let $m_s(i)$ denote the minimum term of $S_i$. We first give the main theorem of this subsection.

Theorem 6 Let $R_X$ be an autocorrelation matrix with complete spectra $\{S_1, S_2, \ldots, S_{d+1}\}$ and $tr(R_X) = 1$. If

$$m_s(i) \ge \frac{1}{d+1}, \quad i = 1, 2, \ldots, d+1 \tag{15}$$

then we can construct a complex random vector with autocorrelation matrix $R_X$ from $d \cdot (d+1)$ independent random variables.

Proof. From [10], we have:

$$R_X = \sum_{i=1}^{d+1} M_i \cdot diag(S_i) \cdot M_i^{H} - I \tag{16}$$

$$R_X = \sum_{i=1}^{d+1} M_i \cdot \Big( diag(S_i) - \frac{1}{d+1} \cdot I \Big) \cdot M_i^{H} \tag{17}$$

If (15) holds, we can construct $d+1$ random vectors $\{Y_1, Y_2, \ldots, Y_{d+1}\}$ such that $\{(Y_i)_j, i = 1, 2, \ldots, d+1, j = 1, 2, \ldots, d\}$ are independent random variables. For each admissible $i, j$, $(Y_i)_j$ satisfies:

$$E((Y_i)_j) = 0 \tag{18}$$

$$E(|(Y_i)_j|^2) = (S_i)_j - \frac{1}{d+1} \tag{19}$$

Let:

$$X = \sum_{i=1}^{d+1} M_i \cdot Y_i \tag{21}$$

Then the autocorrelation matrix of $X$ is $R_X$. □
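The construction in this proof can be illustrated at the level of second moments. The sketch below assumes an odd prime $d$, a standard quadratic-phase MUB construction, and the reading of (15) and (19) with the constant $\frac{1}{d+1}$; the helper names are hypothetical. It computes the variances that the independent variables $(Y_i)_j$ would need and confirms that the covariance of $\sum_i M_i \cdot Y_i$ equals $R_X$, without sampling any random numbers.

```python
import numpy as np

def build_mub(d):
    """d+1 mutually unbiased bases for an odd prime d (assumed construction)."""
    row = np.arange(d).reshape(-1, 1)
    col = np.arange(d).reshape(1, -1)
    bases = [np.eye(d, dtype=complex)]
    for a in range(d):
        exponent = (a * row * row + col * row) % d
        bases.append(np.exp(2j * np.pi * exponent / d) / np.sqrt(d))
    return bases

def independent_source_variances(R, bases):
    """Variance required for (Y_i)_j: (S_i)_j - 1/(d+1), assuming tr(R) = 1."""
    d = R.shape[0]
    return [np.real(np.diag(M.conj().T @ R @ M)) - 1.0 / (d + 1) for M in bases]

def covariance_of_mixture(variances, bases):
    """Covariance of X = sum_i M_i Y_i when all (Y_i)_j are independent, zero mean."""
    return sum(M @ np.diag(v) @ M.conj().T for M, v in zip(bases, variances))

if __name__ == "__main__":
    d = 5
    rng = np.random.default_rng(0)
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = A @ A.conj().T
    rho = rho / np.trace(rho).real
    # Mixing with the identity keeps every spectrum entry >= 0.9/d >= 1/(d+1).
    R = 0.9 * np.eye(d) / d + 0.1 * rho
    bases = build_mub(d)
    variances = independent_source_variances(R, bases)
    print(min(v.min() for v in variances) >= 0)
    print(np.allclose(covariance_of_mixture(variances, bases), R))
```

If some required variance is negative, condition (15) fails for that matrix and this particular construction does not apply.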



Remark I If $m(R_X) \ge \frac{1}{d+1}$, then (15) holds.

Remark II With Theorem 2, one can show that (15) can be replaced by a weaker condition: the sum of the $m_s(i)$ is no less than 1. But many autocorrelation matrices still fail to satisfy it.

The condition in Remark II seems a strong constraint, but in the next subsection we will see that in some places it can be overcome easily, while in others it leads to some natural results.

For convenience, we define:

Definition 5 $X$ is a k-domain random vector if and only if $X = M_k \cdot Y$, where $Y$ is a $d$-dimensional random vector satisfying $E(Y) = 0$ whose terms are independent.

It will be shown that switching between $X$ and $\{Y_i, i = 1, 2, \ldots, d+1\}$ is very useful in various areas. Generally speaking, $X + N$ can be viewed as a composition of independent random vectors from different domains, where $N$ is "white noise" with $E(N \cdot N^{H}) = tr(R_X) \cdot I$. It should be noticed that $N$ and a k-domain random vector have no effect on the r-spectrum except for a global increment/decrement if $r \ne k$. Alternatively, we can think of $X$ as a composition of independent random variables from different domains followed by a denoising procedure. This suggests that we can simply treat a signal as a set of independent signals from different domains, and that the energy distribution on each domain does not change after the composition except for a global constant shift. In other words, the i-spectrum says nothing about the energy distribution of the j-spectrum when $i \ne j$.

C. Linear Random Operator and Some Special Kinds of Filters

In this subsection, we work in the style of signal processing. The reader will see that linear operators/filters for random vectors can be described clearly with MUB. We can also judge whether a filter is good in the sense that it only does what it should and leaves the other parts untouched.

A general formulation of linear random operators is a good starting point for studying complete MUB analysis of operators. Recall that in this paper a random variable is "white noise" only if it is independent of all other random variables.

Definition 6 $P$ is a random linear operator for $d$-dimensional complex random vectors if:

$$P(X) = T \cdot X \tag{22}$$

where $T$ is a random $d \times d$ matrix, and each subset $Sub_X$ of $\{(X)_1, (X)_2, \ldots, (X)_d\}$ and each subset $Sub_T$ of $\{(T)_{i,j}, i, j = 1, 2, \ldots, d\}$ satisfy:

$$Pr\{Sub_X, Sub_T\} = Pr\{Sub_X\} \cdot Pr\{Sub_T\} \tag{23}$$

There are some propositions for $P$ which are trivial but important.

Proposition I For random vectors $X$ and $X'$, if $R_X = R_{X'}$, then $R_{P(X)} = R_{P(X')}$.

Proposition II For random vectors $X$ and $X'$, if $E(X' \cdot X^{H}) = 0$, then $R_{P(X+X')} = R_{P(X')} + R_{P(X)}$.
Then we give the main theorem of this subsection:

Theorem 7 For a random vector $X$ with $tr(R_X) = 1$, let $\{S_1, S_2, \ldots, S_{d+1}\}$ be the complete spectra of $R_X$ and let $\{S_{p1}, S_{p2}, \ldots, S_{p(d+1)}\}$ be the complete spectra of $R_{P(X)}$. There exist $d+1$ deterministic real matrices $\{D_1, D_2, \ldots, D_{d+1}\}$ of dimension $d \times (d^2 + d)$ such that for $i = 1, 2, \ldots, d+1$,

$$S_{pi} = D_i \cdot \Big[ (S_1 - \tfrac{1}{d+1} \cdot One)^T, (S_2 - \tfrac{1}{d+1} \cdot One)^T, \ldots, (S_{d+1} - \tfrac{1}{d+1} \cdot One)^T \Big]^T \tag{24}$$

$One$ is the length-$d$ vector with each term equal to 1.

Proof. First assume that $m(R_X) \ge \frac{1}{d+1}$. Then from Theorem 6, there exist $d+1$ random vectors $\{Y_1, Y_2, \ldots, Y_{d+1}\}$ such that $\{(Y_i)_j, i = 1, 2, \ldots, d+1, j = 1, 2, \ldots, d\}$ are independent random variables, $E(Y_i) = 0$, and $E(|(Y_i)_j|^2) = (S_i)_j - \frac{1}{d+1}$. Let

$$X' = \sum_{i=1}^{d+1} M_i \cdot Y_i \tag{25}$$

Then $R_X = R_{X'}$. From Propositions I and II,

$$R_{P(X)} = R_{P(X')} = \sum_{i=1}^{d+1} \sum_{j=1}^{d} R_{P(M_i \cdot Z_j)} \cdot \Big( (S_i)_j - \frac{1}{d+1} \Big) \tag{26}$$

$Z_j$ is the random vector satisfying $E(Z_j) = 0$ and $E((Z_j)^2) = S_{ij}$. (26) already shows that the properties of $P$ can be determined by some deterministic matrices, but we need to go further.

For each $R_{P(M_i \cdot Z_j)}$, denote its k-spectrum by $S_k^{(i,j)}$. Then the k-spectrum of $R_{P(X)}$ is:

$$S_{pk} = \sum_{i=1}^{d+1} \sum_{j=1}^{d} S_k^{(i,j)} \cdot \Big( (S_i)_j - \frac{1}{d+1} \Big) \tag{27}$$

So there exist $\{D_1, D_2, \ldots, D_{d+1}\}$ satisfying (24).

The second part of the proof deals with the constraint "$m(R_X) \ge \frac{1}{d+1}$". Let $X_n = \frac{1}{\sqrt{d+1}} \cdot (X + N)$, where $N$ is length-$d$ "white noise" with $E(N \cdot N^{H}) = I$. Now $m(R_{X_n}) \ge \frac{1}{d+1}$ holds. Let $L_{one}$ denote the length-$(d^2 + d)$ vector with each term equal to 1. Let $S_{pk}^{(n)}$ denote the k-spectrum of $R_{P(X_n)}$, and let $S_{pk}^{(N)}$ denote the k-spectrum of $R_{P(N)}$. Then

$$S_{pk}^{(n)} = \frac{1}{d+1} \cdot D_k \cdot [S_1^T, S_2^T, \ldots, S_{d+1}^T]^T \tag{28}$$

$$S_{pk}^{(n)} = \frac{1}{d+1} \cdot (S_{pk} + S_{pk}^{(N)}) \tag{29}$$

$$S_{pk}^{(N)} = \frac{1}{d+1} \cdot D_k \cdot L_{one} \tag{30}$$

From the above equations, (24) holds. □



$\{D_1, D_2, \ldots, D_{d+1}\}$ show some basic properties of $P$. For example, let $I_d$ denote the $d \times d$ identity matrix and let $\mathbf{1}_d$ denote the $d \times d$ matrix with each term equal to 1. If there are real numbers $\mu_i$, $i = 1, 2, \ldots, d+1$, such that:

$$D_k = [\mu_1 \cdot \mathbf{1}_d, \mu_2 \cdot \mathbf{1}_d, \ldots, \mu_{d+1} \cdot \mathbf{1}_d] \tag{32}$$

then the output of $P$ will be $[k]$-stationary. If

$$D_k = [\mu_1 \cdot \mathbf{1}_d, \mu_2 \cdot \mathbf{1}_d, \ldots, \mu_{i-1} \cdot \mathbf{1}_d, \mu_i \cdot I_d, \mu_{i+1} \cdot \mathbf{1}_d, \ldots, \mu_{d+1} \cdot \mathbf{1}_d] \tag{33}$$

then $P$ actually switches the i-spectrum of $X$ to the k-spectrum of $P(X)$, up to a global constant increment/decrement.

If a filter is only meant to act on the j-spectrum and keep the information of the other spectra unchanged, then it should try to satisfy:

$$D_k = [\mu_1 \cdot \mathbf{1}_d, \mu_2 \cdot \mathbf{1}_d, \ldots, \mu_{k-1} \cdot \mathbf{1}_d, \mu_k \cdot I_d, \mu_{k+1} \cdot \mathbf{1}_d, \ldots, \mu_{d+1} \cdot \mathbf{1}_d], \quad k \ne j \tag{34}$$

For example, let the matrix $T$ of the operator $P$ be a deterministic matrix of the form $[V, S_1(V), S_2(V), \ldots, S_{d-1}(V)]^{H}$, where $V$ is a length-$d$ vector and $S_i(V)$ means the vector obtained by cyclically left-shifting $V$ $i$ times. This kind of $P$ is well studied (for example the Wiener filter). We can also say that this kind of $P$ is a good 2-spectrum filter if the input signals $X$ are $[1, 3, 4, \ldots, d+1]$-stationary, i.e. $E((X)_i \cdot (X)_j^{H}) = F(j - i)$, where $F$ satisfies $F(k) = F(-k)^{H}$.

So it is a very interesting question which $\{D_1, D_2, \ldots, D_{d+1}\}$ correspond to a physically realizable random operator, but this paper cannot answer it.

The second part of this subsection focuses on some special kinds of operators.

Theorem 8 For $k = 1, 2, \ldots, d$, there exists an operator $P_{i_1, i_2, \ldots, i_k}$ such that for any input $X$ it outputs $X_{i_1, i_2, \ldots, i_k} + N$, where $X_{i_1, i_2, \ldots, i_k}$ is the $[i_1, i_2, \ldots, i_k]$-stabilizer of $X$ and $N$ is "white noise" with $E(N \cdot N^{H}) = tr(R_X) \cdot \frac{d-k}{d} \cdot I_d$.

Proof. For each $j \notin \{i_1, i_2, \ldots, i_k\}$, let $Y_s^{(j)} = diag(y_1^{(j)}, y_2^{(j)}, \ldots, y_d^{(j)})$, where $Y = \{y_i^{(j)}, j \notin \{i_1, i_2, \ldots, i_k\}, i = 1, 2, \ldots, d\}$ are independent random variables with $E(y_i^{(j)}) = 0$ and $E((y_i^{(j)})^2) = 1$. $Y$ is independent of $X$. Let $X'$ be the output of $P_{i_1, i_2, \ldots, i_k}$ with input $X$. $X'$ is constructed from the following equation.

$$X' = \sum_{j \notin \{i_1, i_2, \ldots, i_k\}} M_j \cdot Y_s^{(j)} \cdot M_j^{H} \cdot X \tag{35}$$

Then the t-spectrum $S'_t$ of $X'$ is:

$$S'_t = S_t + tr(R_X) \cdot \frac{d-k}{d} \cdot One, \quad t \notin \{i_1, i_2, \ldots, i_k\} \tag{36}$$

$$S'_t = tr(R_X) \cdot \frac{d-k+1}{d} \cdot One, \quad \text{else} \tag{37}$$

This finishes the proof. □
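The spectra stated in this proof can be verified at the level of covariances. The sketch below is an illustration under several assumptions: $d$ is an odd prime, the MUB come from a standard quadratic-phase construction, (35) is read as a sum over the bases that are not stabilized, and the $y_i^{(j)}$ are independent, zero mean and unit variance, so that the output covariance is $\sum_j M_j \cdot diag(S_j) \cdot M_j^H$ over those bases. Indices are 0-based in the code and the helper names are hypothetical.

```python
import numpy as np

def build_mub(d):
    """d+1 mutually unbiased bases for an odd prime d (assumed construction)."""
    row = np.arange(d).reshape(-1, 1)
    col = np.arange(d).reshape(1, -1)
    bases = [np.eye(d, dtype=complex)]
    for a in range(d):
        exponent = (a * row * row + col * row) % d
        bases.append(np.exp(2j * np.pi * exponent / d) / np.sqrt(d))
    return bases

def spectrum(R, M):
    """Diagonal of M^H R M, i.e. the energy distribution of R in basis M."""
    return np.real(np.diag(M.conj().T @ R @ M))

def stabilizer_output_covariance(R, bases, flattened):
    """Covariance of X' = sum_{j not in flattened} M_j Y_j M_j^H X.

    Y_j is diagonal with independent zero-mean unit-variance entries, so
    E[Y_j A Y_j^H] keeps only the diagonal of A.
    """
    kept = [j for j in range(len(bases)) if j not in flattened]
    return sum(bases[j] @ np.diag(spectrum(R, bases[j])) @ bases[j].conj().T
               for j in kept)

if __name__ == "__main__":
    d = 5
    rng = np.random.default_rng(0)
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    R = A @ A.conj().T
    R = R / np.trace(R).real
    bases = build_mub(d)
    flattened = {1, 2}                       # k = 2 stabilized spectra
    R_out = stabilizer_output_covariance(R, bases, flattened)
    k = len(flattened)
    for t, M in enumerate(bases):
        if t in flattened:
            expected = np.full(d, (d - k + 1) / d)
        else:
            expected = spectrum(R, M) + (d - k) / d
        print(t, np.allclose(spectrum(R_out, M), expected))
```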

It is natural that more precise stabilization needs more compensation in "white noise". It is also very interesting to study how to lower-bound the "noise compensation" of "stabilization".

Based on the above techniques, we can also construct the special kind of filter mentioned above, the one that only works on designated spectra. For example, consider the case where we only want to act in the Fourier domain. Based on the above technique, we can first choose a suitable value for $E((y_i^{(2)})^2)$, for $i = 1, 2, \ldots, d$. Then we output $M_2 \cdot Y_s^{(2)} \cdot M_2^{H} \cdot X + X$. This filter only changes the Fourier spectrum, with compensation in "white noise".



IV. MUB TRANSFORMATION FOR DETERMINISTIC VECTORS 



From now on, $X$ is a deterministic vector of the $d$-dimensional complex linear space, and the k-spectrum of $X$ is denoted by $S_k = M_k^{H} \cdot X$. When $d$ is an odd prime number and $M_1$ denotes $I_d$, $M_k$ with $k > 1$ can be constructed by the formula of [24]:

$$(M_k)_{j,r} = \frac{1}{\sqrt{d}} \cdot W^{\, r \cdot j + \frac{(k-2) \cdot (j^2 - j)}{2}} \tag{38}$$

where $i$ is the square root of $-1$ and $W = e^{\frac{2 \pi i}{d}}$. A trivial observation is that $M_2$ is the discrete Fourier matrix. The following theorem says that for each $k$, the k-spectrum $S_k$ of $X$ can be computed from $X$ nearly as fast as the 2-spectrum, which can use the FFT.

Theorem 9 If for any $k = 1, 2, \ldots, d+1$, $M_k$ is constructed from (38), $T_k$ denotes the time needed to compute $S_k$ from $X$, and $T'_k$ denotes the time needed to compute $X$ from $S_k$, then $T_k \le T_2 + d \cdot T_m$ and $T'_k \le T'_2 + d \cdot T_m$, where $T_m$ is the time needed for one complex multiplication.

Proof. Let $H_k = diag(h_1^{(k)}, h_2^{(k)}, \ldots, h_d^{(k)})$, where $h_j^{(k)} = W^{\frac{(k-2) \cdot (j^2 - j)}{2}}$. Then:

$$M_k = H_k \cdot M_2 \tag{39}$$

$$X = M_k \cdot S_k = H_k \cdot M_2 \cdot S_k \tag{40}$$

$$M_k^{H} = M_2^{H} \cdot H_k^{H} \tag{41}$$

$$S_k = M_k^{H} \cdot X = M_2^{H} \cdot H_k^{H} \cdot X \tag{42}$$

This finishes the proof. □
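The factorization $M_k = H_k \cdot M_2$ translates directly into one pointwise multiplication followed by an FFT. The sketch below is a minimal illustration, assuming an odd prime $d$ and a quadratic-phase family whose basis with parameter `a` has entries $W^{a j^2 + r j}/\sqrt{d}$ (so `a = 0` is the Fourier basis, playing the role of $M_2$); this phase convention and the function names are assumptions and may differ from the exact formula of [24]. With NumPy's sign convention, `np.fft.fft` computes $\sum_j x_j e^{-2\pi i j r/d}$, which is $\sqrt{d}$ times the product of the conjugate-transposed Fourier basis with $x$.

```python
import numpy as np

def dense_basis(d, a):
    """Dense d x d matrix with entries W^(a*j^2 + r*j)/sqrt(d), j = row, r = column."""
    row = np.arange(d).reshape(-1, 1)
    col = np.arange(d).reshape(1, -1)
    exponent = (a * row * row + col * row) % d
    return np.exp(2j * np.pi * exponent / d) / np.sqrt(d)

def chirp(d, a):
    """Diagonal of H: h_j = W^(a*j^2), so that dense_basis(d, a) = diag(h) @ Fourier."""
    n = np.arange(d)
    return np.exp(2j * np.pi * ((a * n * n) % d) / d)

def mub_spectrum_fast(x, a):
    """S = M^H x using d multiplications plus one FFT."""
    d = len(x)
    return np.fft.fft(np.conj(chirp(d, a)) * x) / np.sqrt(d)

def mub_inverse_fast(s, a):
    """x = M S using one inverse FFT plus d multiplications."""
    d = len(s)
    return chirp(d, a) * np.fft.ifft(s) * np.sqrt(d)

if __name__ == "__main__":
    d = 7
    rng = np.random.default_rng(0)
    x = rng.normal(size=d) + 1j * rng.normal(size=d)
    for a in range(d):
        slow = dense_basis(d, a).conj().T @ x
        print(a, np.allclose(mub_spectrum_fast(x, a), slow))
```

The dense product is only used as a reference; the fast path never forms the $d \times d$ matrix.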



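Similar to the DFT, there are also some interesting properties of the MUB spectra of $X$.

Theorem 10 For a normalized complex vector $X$, let $m_i = |S_i|_\infty$. Then the following holds:

$$m_j \le \frac{1}{\sqrt{d}} \cdot m_i + \sqrt{1 - m_i^2}, \quad j \ne i \tag{43}$$

Proof. Without loss of generality, assume $|(S_i)_1| = m_i$. If $j \ne i$, we have:

$$m_j \le \frac{1}{\sqrt{d}} \cdot \sum_{r=1}^{d} |(S_i)_r| \tag{44}$$

$$= \frac{1}{\sqrt{d}} \cdot m_i + \frac{1}{\sqrt{d}} \cdot \sum_{r=2}^{d} |(S_i)_r| \tag{45}$$

$$\le \frac{1}{\sqrt{d}} \cdot m_i + \frac{\sqrt{d-1}}{\sqrt{d}} \cdot \sqrt{1 - m_i^2} \tag{46}$$

$$\le \frac{1}{\sqrt{d}} \cdot m_i + \sqrt{1 - m_i^2} \tag{47}$$

It is easy to see that (46) comes from the Cauchy-Schwarz inequality. □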



The above theorem can be thought of as a generalization of the original "uncertainty principle" to deterministic vectors, while the next theorem is a positive result about the MUB spectra.

Theorem 11 For any normalized complex vector $X$, there exists $k \in [1, d+1]$ satisfying $|S_k|_\infty >$

Proof. Let $V_x$ be a vector which contradicts the theorem. Construct the $d \times d$ matrix $A = [V_x, V_x, \ldots, V_x]$, and let $B = A \cdot A^{H}$. All the $d \times d$ matrices form a $d^2$-dimensional linear space, and $B$ is not in the subspace of all diagonal matrices. It is easy to check that $B$ is orthogonal to all the non-diagonal matrices constructed in Theorem 3.4 of [24], which implies that there are at least $d^2 + 1$ orthogonal basis elements for $d \times d$ matrices, which is impossible. □



If $d$ is prime and the MUB are constructed from (38), numerical analysis suggests that the complete MUB spectra of $X$ also have many interesting properties similar to the spectrum of the DFT. For example, symmetry of $X$ leads to interesting symmetries of all MUB spectra, and cyclic shifts of $X$ also cause shifts of all MUB spectra in the sense of absolute values.



V. ENCODE INFORMATION INTO SECOND MOMENTS 
The main application of the above results is a simple digital communication protocol which can significantly increase the number of users who can use the channel simultaneously, with bounded worst-case cost. Although the theoretical protocol is far from practice, it provides some deep insights into how to encode information into the second moments between random variables. Based on the results of Section III, we introduce some interesting variations of the model, which suggest that many things can be done on top of it.

First we assume that $\{A_1, A_2, \ldots, A_n\}$ are all the nodes who want to communicate with others. There is only one public discrete complex channel $C$ for them to communicate through. In the first half of each time interval, $C$ collects a complex message $Mes_i$ from each $A_i$ and sums all $Mes_i$ into $Mes$; in the second half of the interval it sends $Mes$ to each $A_i$.

We assume that every $d$ intervals, $C$ gives each $A_i$ a synchronization impulse which can be distinguished from messages. The abilities of the $A_i$ are limited: they cannot count the impulses from the start. In fact, the impulses can be thought of as the frame synchronization signal of the channel, and this model is the basis of multiple access digital communication [16], such as TDMA/FDMA [22, 23]. Since the number of digital communication users grows fast, many advanced techniques have been invented to handle large systems, such as the one which combines TDMA and FDMA [14]. In this part, we present an easier way to increase the number of users. We also study some interesting variations of $C$ later.

We say the protocol is $(n, d, m)$ worst-case good on $g$ if there exists a function $g$ such that, from the time when $A_i$ wants to send a $k$-bit message, only $g(n, d, k)$ time intervals are needed to make sure that the probability that $A_j$ gets the right information from $A_i$ is at least 2/3, for each $j \ne i$.

It is easy to see that when $n = O(d)$, the protocol is good because of TDMA or FDMA. If we have more users, we can use the idea of arithmetic coding [13], but it is hard to apply to large systems because $t$ times as many users needs $2^t$ times the power cost for some users. Now we introduce an easily applied protocol which can square the number of users. In fact, it is a protocol which is worst-case good on the function $g = O(d^5 \cdot \lg(k)) \cdot k$, and requires at most $d^3$ times the power cost. Numerical analysis suggests that when $d = 127$ and $n = 254$, within $100 \cdot 127$ time intervals $A_j$ can get the right information from $A_i$ with high probability.

The idea of the protocol is very simple. We first assume that the messages are all positive real numbers. Then we assign each $A_i$ a special range, such as a time range or a frequency range. When $A_i$ wants to send some messages, he first flips coins and gives random signs to the messages. After that, $A_i$ sends the messages coded in his designated range. The key is that if $X$ is a composition of random vectors $\{V_k, k \in [1, d+1]\}$ from different domains (see Definition 5), then the energy distribution of $X$ on domain $k$ is the same as that of $V_k$ except for a global constant shift. As an example, we give the protocol for $d = 4$, $n = 10$.

Assume that for $i = 1, 2, \ldots, 10$, the messages of $A_i$ are two real numbers $Mo_1^{(i)}$ and $Mo_2^{(i)}$ in $[0, 1]$, and he wants to tell the others which one is larger. Let $M_1, M_2, \ldots, M_5$ be the MUB of dimension 4. To $A_i$ we assign $M_{[\frac{i+1}{2}]}$, where $[x]$ means the integer part of $x$. To communicate, $A_i$ first computes $Mes_j^{(i)} = \sqrt{Mo_j^{(i)}}$. Then in each round, $A_i$ flips two coins and changes the sign of $Mes_j^{(i)}$ if he got "heads" at the $j$-th flip, $j = 1, 2$. Then he computes $V_i$ by

$$V_i = M_{[\frac{i+1}{2}]} \cdot [Mes_1^{(i)}, Mes_2^{(i)}, 0, 0]^T, \quad i \text{ odd} \tag{48}$$

$$V_i = M_{[\frac{i+1}{2}]} \cdot [0, 0, Mes_1^{(i)}, Mes_2^{(i)}]^T, \quad i \text{ even} \tag{49}$$

When each synchronization impulse comes, $A_i$ sends the entries of $V_i$ one by one to the channel $C$. $A_j$ receives signals one by one from $C$. Assume the signals in this round form a length-$d$ complex vector $X$. $A_j$ needs to keep $|(M_1^{H} \cdot X)_1|^2$ and $|(M_1^{H} \cdot X)_2|^2$ for the information of $A_1$, and keeps the data for the other $A_i$, $i \ne j$, in a similar way. Then after 1000 rounds, $A_j$ can tell whether $Mo_1^{(1)}$ is larger than $Mo_2^{(1)}$ correctly with high probability.
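The mechanism behind this example can be simulated in a few lines. The sketch below is not the paper's $d = 4$, $n = 10$ setting: it assumes $d = 5$ (an odd prime, so that a simple quadratic-phase MUB construction applies), only two senders in two different domains, and hypothetical amplitude values. The receiver averages the energies of the received vector in the first sender's basis; the second sender contributes only an approximately constant offset, so the ordering of the first sender's two messages remains visible.

```python
import numpy as np

def build_mub(d):
    """d+1 mutually unbiased bases for an odd prime d (assumed construction)."""
    row = np.arange(d).reshape(-1, 1)
    col = np.arange(d).reshape(1, -1)
    bases = [np.eye(d, dtype=complex)]
    for a in range(d):
        exponent = (a * row * row + col * row) % d
        bases.append(np.exp(2j * np.pi * exponent / d) / np.sqrt(d))
    return bases

def simulate(rounds=2000, seed=0):
    """Average energy seen in sender A's domain when A and B transmit together."""
    d = 5
    bases = build_mub(d)
    rng = np.random.default_rng(seed)
    basis_a, basis_b = bases[1], bases[2]          # two different domains
    # Square roots of the messages; A's first message is the larger one.
    amp_a = np.array([0.9, 0.3, 0.0, 0.0, 0.0])    # hypothetical values
    amp_b = np.array([0.5, 0.8, 0.0, 0.0, 0.0])    # hypothetical values
    energy = np.zeros(d)
    for _ in range(rounds):
        # Each sender flips coins to randomize the signs, then encodes in its basis.
        v_a = basis_a @ (amp_a * rng.choice([-1.0, 1.0], size=d))
        v_b = basis_b @ (amp_b * rng.choice([-1.0, 1.0], size=d))
        x = v_a + v_b                               # the channel sums the signals
        energy += np.abs(basis_a.conj().T @ x) ** 2
    return energy / rounds

if __name__ == "__main__":
    est = simulate()
    print("first message looks larger:", est[0] > est[1])
```

In expectation the two averages are $0.81 + c$ and $0.09 + c$ with the same offset $c = (0.25 + 0.64)/5$, which is the "global constant shift" caused by the other domain.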

In the general case, we count the rounds needed by $A_j$ theoretically. Assume $A_j$
wants to recover $E(|(M_1^{\dagger} \cdot X)_1|^2)$. During the communication, some users of domain
$k$, $k \neq 1$, may start or stop sending signals to $C$. This does not matter, because to
$A_j$ they all look like the same global noise and do not affect the relation between
$E(|(M_1^{\dagger} \cdot X)_1|^2)$ and $E(|(M_1^{\dagger} \cdot X)_2|^2)$. So we only consider the case when the
total energy of all domains except 1 is upper bounded by $K$.
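The reason the other domains only contribute a constant can be written out. This is a sketch under assumptions: each domain-$l$ component is taken to be $V_l = M_l \, n^{(l)}$ with independent zero-mean entries $n^{(l)}_j$, $M_1^{\dagger}$ denotes the conjugate transpose of $M_1$, and $m_{l,j}$ (a symbol introduced here) is the $j$-th column of $M_l$.

```latex
\begin{align*}
E\big(|(M_1^{\dagger} X)_i|^2\big)
  &= E\Big(\Big|\sum_{l=1}^{d+1}\sum_{j=1}^{d}
       \langle m_{1,i}, m_{l,j}\rangle\, n^{(l)}_j\Big|^2\Big) \\
  &= \sum_{l=1}^{d+1}\sum_{j=1}^{d}
       |\langle m_{1,i}, m_{l,j}\rangle|^2\, E\big(|n^{(l)}_j|^2\big) \\
  &= E\big(|n^{(1)}_i|^2\big)
     + \frac{1}{d}\sum_{l \neq 1}\sum_{j=1}^{d} E\big(|n^{(l)}_j|^2\big).
\end{align*}
```

The second step uses independence and zero mean, and the third uses $\langle m_{1,i}, m_{1,j}\rangle = \delta_{ij}$ together with $|\langle m_{1,i}, m_{l,j}\rangle|^2 = 1/d$ for $l \neq 1$. The last term does not depend on $i$ and is at most $K/d$, so it shifts both compared quantities by the same amount.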

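We know that $X$ is constructed from independent random variables $\{n^{(i)}_j\}$ from
different domains, where $n^{(i)}_j$ denotes the $j$'th random variable from domain $i$.
Because $n^{(1)}_j$ does not affect $|(M_1^{\dagger} \cdot X)_1|^2$ for $j = 2, 3, \ldots, d$, we bound the
standard deviation $\sigma(|(M_1^{\dagger} \cdot X)_1|^2)$ by:

$$\sigma^2(|(M_1^{\dagger} \cdot X)_1|^2) \le O(E(|(M_1^{\dagger} \cdot X)_1|^4)) \qquad (51)$$

$$\le O\Big(\sum_{j_1, j_2,\, i_1 \neq 1,\, i_2 \neq 1} E((n^{(i_1)}_{j_1})^2) \cdot E((n^{(i_2)}_{j_2})^2)\Big) \qquad (52)$$

$$\le O(K^2) \qquad (53)$$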



$K$ denotes the total energy from all the domains except 1. Let $M(|(M_1^{\dagger} \cdot X)_1|^2)$
denote the mean value of $|(M_1^{\dagger} \cdot X)_1|^2$ over $r$ rounds. Using the Chernoff bound, we
conclude that when $r = O(K^2 \cdot \lg(k))$, we have:

Pro{M(\{M[ ■ X)!\ 2 ) - E(\(M[ ■ X),\ 2 ) > 0{ 1 -)) < 0{\) (54) 



So the probability that all $k$ bits are correct is more than a positive constant.
In the worst case, when $K = d^2$, we need $O(d^5 \cdot \lg(k) \cdot k)$ time intervals to make
sure Aj can receive the right information from Ai with probability larger than |. 
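One way to read this count is sketched below. It assumes that each round occupies $d$ time intervals (one per entry of $X$) and that each of the $k$ bits needs its own $r$ rounds; $T$ is a symbol introduced here for the total number of time intervals.

```latex
\begin{align*}
r &= O(K^2 \cdot \lg k) = O(d^4 \cdot \lg k)
   && \text{rounds per bit, with } K = d^2, \\
T &= k \cdot d \cdot r = O(d^5 \cdot \lg(k) \cdot k)
   && \text{time intervals in total.}
\end{align*}
```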

Next we consider the error from quantization. It is easy to check that when the
error of $(X)_i$ is less than $\epsilon$ for $i = 1, 2, \ldots, d$, the error of $|(M_1^{\dagger} \cdot X)_1|^2$ is less than
0(d 2 ■ e). So if e < the mean error of \{M[ ■ X)\\ 2 from quantification will be less 
than 0(\). For each Ai, he need to quantify the the signal(sent or received) to 0(d ) 
discrete magnitude values and 0(d 3 ) discrete phase values to satisfy e < Jj. 

If only time or frequency resources are allowed, the protocol is just "TDMA/FDMA".
When more than one domain from MUB is used, we must bound the total energy of each
domain, because it is the "noise" of the other domains. There is a trade-off in this
model: when more users work simultaneously, more noise comes, so more rounds are
needed. But the rounds needed by $A_i$ are upper bounded by a function that depends
only on $n$, $d$, $k$. Although each user can choose any time to start or end a
communication process, a better choice is a time when the energy of his designated
range is low, which may bring an average optimization to the whole system. So when
the frequency resource is in shortage, and it is not suitable to apply some advanced
techniques to the system, this seems a reasonable way to allocate resources to a
large number of users, because it is adaptable, analyzable, and worst-case bounded.

Actually, traditional protocols such as "TDMA" are based on the first moments of 
the signal, while the highlight of our protocol is that it can fully utilize the information 
of the second moments of the signals. 

Next we focus on some special kinds of channels/filters $C$, based on subsection
C of Section III. We study how $C$ can process the information of each $A_i$.

First, when $C$ has "white noise" $N$, then $N$ affects all the users equally, as
"white noise".

Second, if $C$ can be described by some deterministic matrixes $\{D_1, D_2, \ldots, D_{d+1}\}$
(see Theorem 7), $C$ will do what we claimed in the part following Theorem 7. So we
can choose the domains that have nice properties to realize the protocol.

Third, following the idea of Theorem 8, $C$ can do something special to $A_i$. For
example, $C$ can change the information of $A_i$ without affecting the others, except
for some "noise" that looks the same globally. In fact, $C$ can stabilize the range
designated to $A_i$ so that nobody can learn the information from $A_i$.

Compared to a traditional protocol such as "TDMA", $C$ can do almost all the jobs
that the channel $C_t$ of "TDMA" can do. Moreover, $C$ can also do things $C_t$ cannot
do; for example, $C$ can switch the information between different domains. However,
almost every special thing $C$ can do will bring "noise". So the question raised before,
"what kinds of $\{D_1, D_2, \ldots, D_{d+1}\}$ correspond to a physically realizable filter",
becomes important.

VI. DISCRETE SIGNAL ANALYSIS WITH MUB 



In subsection B of Section III, the traditional definition of "stationary" is greatly
extended by MUB. And subsection C of Section III suggests that spectra which are far
from stationary must imply some nontrivial information in their domains. Actually,
if we treat discrete signals as a composition of independent signals from different
domains, then spectral analysis in any domain has its own meaning: the $k$-spectrum
uniquely describes the energy distribution of the $k$-domain random vector except for
a global constant shift. So Fourier spectrum analysis also makes sense when the signal
is nonstationary.

Subsection C of Section III gives some ideas about how to construct filters to
process statistical signals. These filters differ from traditional ones in the sense
that they must take into account all the spectra we are interested in.

Next, for signal detection, we give a definition of how to judge whether
a signal is meaningful.

Definition 7 The k-spectrum entropy of X is defined Ek{X) = Xlf=i(~^fl( tf(Rx) ))> 
the complete entropy of $X$ is defined as $E_C(X) = \sum_{j=1}^{d+1} E_j(X)$.

So meaningful signals should have $E_C$ less than $d \cdot (d+1) \cdot \lg(d)$. And a signal with
$E_2$ much less than $d \cdot \lg(d)$ must imply some important information in the Fourier
domain, no matter whether the signal is stationary or not.

However, the most important thing left open in this part is how to justify the
physical meaning of each base. This paper failed to achieve it. Unlike the Fourier
base, for the other bases from MUB it looks impossible to match them with continuous
functional transformations when we only use the construction for prime $d$.
Roughly speaking, the MUB spectra based on the construction for prime $d$ are very
sensitive to $d$. For instance, when a vector has only a single point in the
$k$-spectrum ($k > 2$) for dimension $d$, the $k$-spectrum changes a lot when we consider
dimension $d' > d$, and the larger $k$, the bigger the change. In any case, the paper
suggests that if the physical meaning of a base (such as the Fourier base) has been
found, then spectral analysis in that base will always make sense.
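For reference, here is a minimal sketch of one standard construction for odd prime $d$, assumed here to be the kind of formula the text refers to; the function names are hypothetical. Besides the standard basis, base $a$ consists of the vectors with entries $\omega^{a j^2 + b j}/\sqrt{d}$, where $\omega = e^{2\pi i/d}$, and $a = 0$ gives the Fourier base. The check confirms that vectors from different bases have inner products of magnitude $1/\sqrt{d}$.

```python
import cmath
import math


def prime_mub(d):
    """Return d+1 bases for an odd prime d; each base is a list of d column vectors."""
    omega = cmath.exp(2j * cmath.pi / d)
    # Base 0: standard basis.
    bases = [[[1 + 0j if r == c else 0j for r in range(d)] for c in range(d)]]
    # Base a+1: vector b has entries omega^(a*j^2 + b*j) / sqrt(d); a = 0 is Fourier.
    for a in range(d):
        bases.append([[omega ** ((a * j * j + b * j) % d) / math.sqrt(d)
                       for j in range(d)] for b in range(d)])
    return bases


def inner(u, v):
    """Inner product, conjugate-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


def check_unbiased(bases, tol=1e-9):
    """True if every base is orthonormal and all pairs of bases are unbiased."""
    d = len(bases[0])
    for p, base_p in enumerate(bases):
        for q, base_q in enumerate(bases):
            for s, u in enumerate(base_p):
                for t, v in enumerate(base_q):
                    if p == q:
                        target = 1.0 if s == t else 0.0
                    else:
                        target = 1 / math.sqrt(d)
                    if abs(abs(inner(u, v)) - target) > tol:
                        return False
    return True
```

With these definitions, `check_unbiased(prime_mub(5))` evaluates to `True`. The same formula applied to the non-prime $d = 4$ fails the check, so prime-power dimensions need a different construction.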

To achieve this goal, we need efforts from various areas. For example, we need
scientists from signal processing, physics, and bioinformatics to find physical
meanings of spectra that are definitely different from frequency. And we
also need mathematicians to tell us how to construct MUB which have as many good
properties as possible (such as the Fourier bases).

VII. DIMENSIONALITY REDUCTION WITH MUB 



For lossy data compression such as dimensionality reduction, it is sometimes hard
to achieve a good compression ratio when little prior knowledge is available, and
things become even worse when the data looks like "white noise" [11], [9]. In this
section, we claim that Mutually Unbiased Bases can do this seemingly impossible job
in some sense.

In the following, compressing $X$ with MUB means choosing a subset $Sub_m$ of all
MUB bases and finding an optimal MUB spectrum of $Sub_m$ to express $X$, which needs
only $\lg(d)$ bits to denote which base has been chosen.

Theorem 8 is a technical reason why engineers can choose any unbiased base to
do data transformation, Theorem 9 suggests that not all spectra can look good, and
Theorem 10 makes sure that the worst case will not happen when the whole MUB spectra
are considered. Next, we will do something different.

$Sp$ denotes the unit sphere of the $d$-dimensional complex linear space, i.e.
$Sp = \{V \mid \langle V, V \rangle = 1, V \in \mathbb{C}^d\}$. For any subset $Sub_{Sp}$ of $Sp$, $V(Sub_{Sp})$ denotes
its standard volume measure on the $d$-dimensional complex sphere [19].

A normalized uniform random vector is a good starting point for analyzing the case
when no prior knowledge is available.

Definition 8 $X$ is a normalized uniform random vector if and only if, for any
subset $Sub_{Sp}$ of $Sp$:

$$Pr(X \in Sub_{Sp}) = \frac{V(Sub_{Sp})}{V(Sp)} \qquad (55)$$

In the following, compressing $X$ with $k$ normalized unitary matrixes $\{B_1, B_2, \ldots, B_k\}$
means choosing an optimal spectrum of these bases to express $X$, which needs only
$\lg(k)$ bits to denote which base has been chosen. First we assume $k \le d+1$ bases from
MUB are chosen, and the maximum absolute value of $X$'s $i$-spectrum is $m_i$. Then we
arbitrarily choose $k$ normalized unitary matrixes $U_1, U_2, \ldots, U_k$, and let
$u_i = |U_i \cdot X|_{\infty}$. We often want to find some spectrum with a large entry to express
$X$. The following theorem justifies that the bases from MUB do better than any
$\{U_i\}$ locally.

Theorem 12 When $X$ is a normalized uniform random vector, then:

$$Pr\Big(\max(m_1, m_2, \ldots, m_k) > \sqrt{\tfrac{d}{2d+1-2\sqrt{d}}}\Big) \ge Pr\Big(\max(u_1, u_2, \ldots, u_k) > \sqrt{\tfrac{d}{2d+1-2\sqrt{d}}}\Big) \qquad (56)$$



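Proof. First, a lemma will be shown:

Lemma 13 If $V_1, V_2$ are two normalized complex vectors of length $d$ satisfying
$|\langle V_1, V_2 \rangle| \le \frac{1}{\sqrt{d}}$, then for any normalized vector $V$, if
$|\langle V, V_1 \rangle| = |\langle V, V_2 \rangle| = C$, we have

$$C \le \sqrt{\frac{d}{2d+1-2\sqrt{d}}} \qquad (57)$$

Proof. There exists some normalized vector $W$ with $|\langle W, V_1 \rangle| = 0$ such that
$V = e^{i\theta_1} \cdot C \cdot V_1 + \sqrt{1 - C^2}\, W$, where $i$ is the square root of $-1$. Then we have:

$$C = |\langle V, V_2 \rangle| = \Big|\frac{1}{\sqrt{d}} \cdot C \cdot e^{i\theta_1} + \sqrt{1 - C^2} \cdot \langle W, V_2 \rangle\Big| \qquad (58)$$

$$\le C \cdot \frac{1}{\sqrt{d}} + \sqrt{1 - C^2} \qquad (59)$$

From the above inequality, we can prove the lemma. □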
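The remaining algebra is short. Starting from the last inequality, $C \le C/\sqrt{d} + \sqrt{1 - C^2}$, with $0 \le C \le 1$ so that both sides of the second line are nonnegative:

```latex
\begin{align*}
C\Big(1 - \frac{1}{\sqrt{d}}\Big) &\le \sqrt{1 - C^2} \\
C^2\Big(1 - \frac{1}{\sqrt{d}}\Big)^2 &\le 1 - C^2 \\
C^2\Big(2 - \frac{2}{\sqrt{d}} + \frac{1}{d}\Big) &\le 1 \\
C^2 &\le \frac{d}{2d + 1 - 2\sqrt{d}}
\end{align*}
```

Taking square roots gives the bound of the lemma. The right-hand side tends to $1/2$ as $d$ grows, so the bound on $C$ tends to $\sqrt{2}/2$.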



For any vector $V_0$ and constant $C$, let:

$$De(V_0, C) = \{V : |V|_2 = 1, |\langle V, V_0 \rangle| > C, V \in \mathbb{C}^d\} \qquad (60)$$

If $C = \sqrt{\frac{d}{2d+1-2\sqrt{d}}}$ and $V_i, V_j$ are any two unequal vectors from MUB, then:

$$De(V_i, C) \cap De(V_j, C) = \emptyset \qquad (61)$$

So

$$Pr(\max(m_1, m_2, \ldots, m_k) > C) \ge k \cdot d \cdot \frac{V(De(V_0, C))}{V(Sp)} \qquad (62)$$

$$\ge Pr(\max(u_1, u_2, \ldots, u_k) > C) \qquad (63)$$

□



Remark I When $d$ goes to infinity, $\sqrt{\frac{d}{2d+1-2\sqrt{d}}}$ tends to $\sqrt{2}/2$.

Remark II When d goes to infinity, d ■ (d + 1) • V ^(g^j C ' ) ' ) goes to zero when 
(' -II. 

Remark II is bad news for large-size data. In this case, we can cut the whole
vector into shorter ones, at the cost of more bits to denote which bases have been
used. The next conjecture tries to support MUB globally, where $m_i, u_i$ have the same
meaning as above.

Conjecture 1 When $X$ is a normalized uniform random vector, then:

$$E(\max(m_1, m_2, \ldots, m_k)) \ge E(\max(u_1, u_2, \ldots, u_k)) \qquad (64)$$

Numerical analysis by the author strongly supports the conjecture.
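The quantities in the conjecture can be estimated with a small Monte Carlo experiment. The sketch below is not the author's numerical analysis. It assumes an odd prime $d$ with the quadratic-phase MUB formula, draws the arbitrary bases by Gram-Schmidt on Gaussian vectors, and uses hypothetical function names. It estimates $E(\max(m_1, \ldots, m_k))$ and $E(\max(u_1, \ldots, u_k))$ for the same random vectors $X$.

```python
import cmath
import math
import random


def prime_mub(d):
    """d+1 mutually unbiased bases for an odd prime d, as lists of column vectors."""
    omega = cmath.exp(2j * cmath.pi / d)
    bases = [[[1 + 0j if r == c else 0j for r in range(d)] for c in range(d)]]
    for a in range(d):
        bases.append([[omega ** ((a * j * j + b * j) % d) / math.sqrt(d)
                       for j in range(d)] for b in range(d)])
    return bases


def inner(u, v):
    """Inner product, conjugate-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


def random_unit_vector(d, rng):
    """Normalized complex Gaussian vector, uniform on the unit sphere of C^d."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in v))
    return [c / norm for c in v]


def random_basis(d, rng):
    """An arbitrary orthonormal basis via Gram-Schmidt on random vectors."""
    basis = []
    while len(basis) < d:
        v = random_unit_vector(d, rng)
        for u in basis:
            p = inner(u, v)
            v = [vc - p * uc for vc, uc in zip(v, u)]
        norm = math.sqrt(sum(abs(c) ** 2 for c in v))
        basis.append([c / norm for c in v])
    return basis


def best_entry(bases, x):
    """Largest absolute spectrum entry of x over all the given bases."""
    return max(abs(inner(col, x)) for basis in bases for col in basis)


def compare(d=5, k=3, trials=2000, seed=0):
    """Estimate E(max m_i) for k MUB bases and E(max u_i) for k arbitrary bases."""
    rng = random.Random(seed)
    mub = prime_mub(d)[:k]
    arbitrary = [random_basis(d, rng) for _ in range(k)]
    sum_m = sum_u = 0.0
    for _ in range(trials):
        x = random_unit_vector(d, rng)
        sum_m += best_entry(mub, x)
        sum_u += best_entry(arbitrary, x)
    return sum_m / trials, sum_u / trials


mean_m, mean_u = compare()
```

Both estimates lie between $1/\sqrt{d}$ and 1 by construction. Whether `mean_m` exceeds `mean_u` for a given choice of arbitrary bases is exactly what the conjecture asserts, and a single run of this sketch does not settle it.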

When the autocorrelation matrix $R_X$ of $X$ is known, Principal Component
Analysis (PCA) [11], [9] works really well. However, it is hard to change the PCA base
when $R_X$ changes. It is interesting to consider MUB when $R_X$ is known, and to choose
the unbiased bases according to the information in the complete spectra of $R_X$. As
discussed above, we can treat $X$ as a bunch of independent random vectors from
different domains. So engineers only need to choose the bases which have nice spectra
to get an average optimization. Theorem 5 implies that some inaccuracy in the
autocorrelation matrix will not affect the result much. But Theorem 3 says that there
must be some MUB spectra of $R_X$ that look bad.



VIII. CONCLUSIONS 



In this paper, we studied the subject about how to analyze, process, and utilize the 
information in the second order moments between random variables. We presented a 
number of applications of this subject. However, many problems remain open, and we 
list some important ones here: 

(i) What about the information in moments of order higher than 2? 

(ii) How do we find MUB when $d$ is not a power of a prime? In particular, for prime
dimension $d$, there are simple formulas to compute MUB and fast algorithms to do the
transformation; what can we say about the case when $d$ is not prime?

(iii) What about the physical meaning of the non-Fourier bases?

(iv) What kinds of second-moment filters (see subsection C of Section III) are
physically realizable?

We should notice that Symmetric Informationally Complete Sets (SICs) [20], [5]
can do a similar job. But we do not know whether SICs exist for complex linear spaces
of dimension larger than 45. It would be interesting to ask which one (SICs or MUB) is
more fundamental for expressing discrete statistical signals.